
gh-148259: make list.remove atomic for non-trivial __eq__ comparison#148440

Draft
KowalskiThomas wants to merge 4 commits into python:main from KowalskiThomas:kowalski/bug-prevent-race-condition-in-list-remove

Conversation

@KowalskiThomas
Contributor

@KowalskiThomas KowalskiThomas commented Apr 12, 2026

What is this PR?

This PR addresses the bug reported in #148259 where calling rich comparison may result in releasing the GIL which may result in list.remove removing the wrong item from the list.

The PR also updates the tests added for #126033.

Contributor

@colesbury colesbury left a comment


@KowalskiThomas KowalskiThomas force-pushed the kowalski/bug-prevent-race-condition-in-list-remove branch from 858e505 to c30ea14 on April 13, 2026 at 21:21
Member

@picnixz picnixz left a comment


The raises parameter is now misleading. Please correct it. In addition, please add tests to list itself. And please also update the docs if needed.

Overall, I'm not convinced by this change. Mutating the list while doing the comparison is unsafe. We just don't want to crash. But making it raise a RuntimeError does not necessarily make it better. Otherwise, we would need to do it for all other mutable sequences in C to have more or less the same behavior.

cc @rhettinger

@bedevere-app

bedevere-app Bot commented Apr 13, 2026

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@KowalskiThomas
Contributor Author

The raises parameter is now misleading. Please correct it. In addition, please add tests to list itself. And please also update the docs if needed.

Yes, I had to stop in the middle of my work earlier -- I didn't plan to leave it in that state, sorry.

Otherwise, we need to do it for all other mutable sequences in C to have more or less the same behavior.

I think some sequences already raise RuntimeError when incompatible things are done to them concurrently, right?

@KowalskiThomas
Contributor Author

KowalskiThomas commented Apr 14, 2026

I just had the time to give this another look. I've updated the tests again and I believe everything now passes.

EDIT: just realised you also asked for tests on list; I'll add them later today!

@picnixz
Member

picnixz commented Apr 14, 2026

I think some sequences already raise RuntimeError when incompatible things are done to them concurrently, right?

Yes, but it is not always consistent. I see this as a garbage-in/garbage-out case. Do not mutate your list while doing other things concurrently. We prefer not raising and leaving it as a GIGO rather than adding complexity and changing existing behaviors.

We do fix cases where there are hard crashes (segfaults), but otherwise we try to retain existing behaviors. Anyway, I think we first need to discuss this, perhaps even standardize it in a PEP, that is, make sure that concurrent mutations raise consistently for whichever collection you are using (not just lists or dicts), and how.

@picnixz
Member

picnixz commented Apr 14, 2026

Yes, I had to stop in the middle of my work earlier -- I didn't plan to leave it in that state, sorry.

In this case, please make it a draft.

@KowalskiThomas KowalskiThomas marked this pull request as draft April 17, 2026 22:16
@KowalskiThomas
Contributor Author

Anyway, I think we first need to discuss this, perhaps even standardize it in a PEP, that is, make sure that concurrent mutations raise consistently for whichever collection you are using (not just lists or dicts), and how.

This makes sense to me. I admit I didn't check what other collections do outside deque (which does raise RuntimeError in case it is mutated concurrently); someone on the original issue did for set and dict (which from what I can tell properly handle those cases).
I'll look more into this and maybe check what the process for a PEP is.
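For reference, the deque behaviour mentioned above can be sketched in a few lines: mutating a deque while iterating over it makes the iterator raise RuntimeError.

```python
from collections import deque

d = deque([1, 2, 3])
raised = False
try:
    for item in d:
        d.append(4)  # concurrent-style mutation during iteration
except RuntimeError:
    # CPython's deque iterator detects the mutation and raises
    # RuntimeError ("deque mutated during iteration").
    raised = True
print(raised)  # True
```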

I see this as a garbage-in/garbage-out case. Do not mutate your list while doing other things concurrently. We prefer not raising and leaving it as a GIGO rather than adding complexity and changing existing behaviors.

While I generally understand the GIGO case/rationale, I beg to differ on that one.

The way this can "currently" (with the GIL) happen is a bit convoluted and unlikely in a real-life scenario (a custom equality operator releasing the GIL just as another thread resumes and mutates the list), so I wouldn't worry too much about it.

However, with the GIL disabled, I can see it happening more commonly (it really just takes two threads working on the same list at the same time, I believe?).
On top of that, many operations on list are atomic in practice (and PEP 703 ensures they stay so without the GIL), so as a normal user who doesn't know anything about how list (or CPython, for that matter) works internally, I would be very confused if list.remove sometimes didn't remove the right item when all the other list operations always work as expected.

It would seem (at least to me?) much more helpful to get an exception loudly saying that I'm doing something wrong than to have my code randomly fail to do what I want every once in a while -- a failure I might only notice because some downstream result of that list.remove misbehaves, making the issue very hard to reproduce and debug.

What is your take on that? Happy to be told so if I'm missing part of the picture here, of course.

@picnixz
Member

picnixz commented Apr 18, 2026

My take is:

  • equality is not meant to mutate the list, whether it is with or without the GIL.
  • legitimate use cases that make it unsafe without the GIL must be fixed
  • convoluted ones are GIGO. There is no point in making it slower for legitimate use cases just to protect unlikely cases.

@KowalskiThomas
Contributor Author

Thanks for the answer. I completely agree with all three points: equality shouldn't mutate -- definitely; convoluted use cases are GIGO -- yes; legitimate use cases should be fixed -- absolutely, and that is actually why I'm pushing a bit on this one. I believe this is a bug that real people could encounter.

As we've already discussed, this limitation isn't introduced by free threading, it's only made worse by it. The problem can happen, even today, if (1) one thread is calling list.remove (2) another one is mutating the list (3) the items in the list have a custom equality operator that always (or sometimes) releases the GIL or interrupts the CS more generally.
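A minimal, Django-free sketch of these three conditions (the `Item` class is hypothetical; `time.sleep` stands in for any comparison work that releases the GIL, such as a database call):

```python
import threading
import time


class Item:
    """Hypothetical stand-in for an ORM object with a slow __eq__."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> bool:
        # time.sleep releases the GIL, like a database call would.
        time.sleep(0.01)
        return isinstance(other, Item) and self.name == other.name

    def __hash__(self) -> int:
        return hash(self.name)


lst = [Item("target"), Item("sentinel")]


def inserter() -> None:
    # Mutate the list while remove() is still inside the comparison.
    time.sleep(0.005)
    lst.insert(0, Item("intruder"))


t = threading.Thread(target=inserter)
t.start()
lst.remove(Item("target"))  # may delete the item at a now-stale index
t.join()

# Buggy interleavings leave "target" in the list and drop "intruder".
print([item.name for item in lst])
```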

I have a use case that I do believe is legitimate and not convoluted, and that could happen to real users. The bug can happen when using Django (which is pretty widespread) with its ORM (which is also common), for ORM objects that have a custom __eq__ that uses a deferred field (which also is common practice), because the custom __eq__ may result in a database call which will obviously break the CS.

The following script runs a Django app/reproducer that does exactly this, live against an SQLite database.

from __future__ import annotations

import os
import sys
import tempfile
import threading
import time
from typing import Any

import django
from django.conf import settings
from django.apps import apps
from django.db import connection, models


assert not sys._is_gil_enabled()

_DB_FD, _DB_PATH = tempfile.mkstemp(prefix="django_race_", suffix=".sqlite3")
os.close(_DB_FD)


settings.configure(
    DEBUG=False,
    DATABASES={
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": _DB_PATH,
        },
    },
    INSTALLED_APPS=["__main__"],
    USE_TZ=True,
    DEFAULT_AUTO_FIELD="django.db.models.AutoField",
)
django.setup()


class Person(models.Model):
    email = models.CharField(max_length=100, unique=True)
    name = models.CharField(max_length=100)

    class Meta:
        app_label = "__main__"

    # Business-key __eq__: same shape as Django's stock Model.__eq__, but
    # compares on ``email`` (a deferred, non-pk field).
    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Person):
            return NotImplemented
        if self._meta.concrete_model != other._meta.concrete_model:
            return False
        return self.email == other.email # if not fetched yet, will fetch

    def __hash__(self) -> int:
        # __hash__ must be defined because __eq__ is overridden.
        return hash(("Person", self.email))


apps.register_model("__main__", Person)


def setup_database() -> None:
    with connection.schema_editor() as se:
        se.create_model(Person)

    Person.objects.bulk_create([
        Person(email="target@example.com", name="Target"),
        Person(email="sentinel@example.com", name="Sentinel"),
        Person(email="intruder@example.com", name="Intruder"),
    ])


ITERATIONS: int = 2000
INSERTER_DELAY_S: float = 0.0005


def run_once() -> dict[str, Any]:
    # Load target with `email` deferred; accessing .email will SELECT.
    target = Person.objects.defer("email").get(name="Target")
    sentinel = Person.objects.get(name="Sentinel")
    intruder = Person.objects.get(name="Intruder")

    assert "email" not in target.__dict__, "email should be deferred"

    lst: list[Person] = [target, sentinel]

    def inserter() -> None:
        # sleep a bit to avoid starting and finishing immediately
        time.sleep(INSERTER_DELAY_S)
        lst.insert(0, intruder)

    t = threading.Thread(target=inserter)
    t.start()

    try:
        try:
            # Query-side object; its `email` is already populated (constructed
            # in-memory, not fetched with defer()), so only the target's
            # email access goes to the DB.
            lst.remove(Person(email="target@example.com", name="Target"))
        except ValueError:
            pass
    finally:
        t.join()

    pk_values = [p.pk for p in lst]
    remaining_names = [p.name for p in lst]
    # Correct behaviour: Target is gone, remaining names == ["Intruder", "Sentinel"]
    # Buggy behaviour:   Intruder was removed at index 0 while Target's comparison
    #                    was still in flight, remaining names == ["Target", "Sentinel"]
    race_detected = any(p.name == "Target" for p in lst)
    return {
        "race_detected": race_detected,
        "remaining": remaining_names,
        "pks": pk_values,
    }


def main() -> None:
    print(f"Python {sys.version}")
    print(f"Django {django.__version__}")
    print(f"SQLite DB: {_DB_PATH}")
    setup_database()

    print(f"Running {ITERATIONS} iterations …\n")
    try:
        races = 0
        for i in range(ITERATIONS):
            result = run_once()
            if result["race_detected"]:
                races += 1
                print(f"  [iteration {i+1}] list.remove deleted the wrong Person; remaining = {result['remaining']}")
        print(f"\nRaces detected: {races}/{ITERATIONS}")
    finally:
        try:
            os.unlink(_DB_PATH)
        except OSError:
            pass

if __name__ == "__main__":
    main()

Running this on a free-threaded build of Python (it might also happen otherwise) triggers the race on almost every run. I don't think this is a really contrived example -- it's something real people could definitely do.


On top of that, I have seen your concern about performance, and I agree that there is clearly a tradeoff between making everything correct and making 99.99% of usage fast. I think with the current proposal, performance doesn't really take a hit. I made a pyperf benchmark comparing the two implementations (before and after my changes), and the performance difference is well within the noise. Also note this is for the most trivial case, where equality is on integers; with any custom equality operator, its own overhead would make the added logic matter even less.

from __future__ import annotations

import pyperf
from typing import Any


SIZES: list[int] = [10, 100, 1000, 10_000]
POSITIONS: list[str] = ["start", "middle", "end"]


def _target_index(size: int, position: str) -> int:
    if position == "start":
        return 0
    if position == "middle":
        return size // 2
    if position == "end":
        return size - 1
    raise ValueError(position)


def bench_remove(loops: int, size: int, position: str) -> float:
    lists: list[list[int]] = [list(range(size)) for _ in range(loops)]
    target: int = _target_index(size, position)

    t0: float = pyperf.perf_counter()
    for lst in lists:
        lst.remove(target)
    return pyperf.perf_counter() - t0


def main() -> None:
    runner: Any = pyperf.Runner()
    runner.metadata["description"] = (
        "list.remove() across varying sizes and target positions."
    )

    for size in SIZES:
        for position in POSITIONS:
            name: str = f"list.remove size={size} pos={position}"
            runner.bench_time_func(name, bench_remove, size, position)


if __name__ == "__main__":
    main()

The results of the benchmark are the following...

| Benchmark | Without check | With check | Delta |
| --- | --- | --- | --- |
| size=10 pos=start | 20.9 ns | 21.2 ns | +1.4% |
| size=10 pos=middle | 47.6 ns | 47.8 ns | +0.4% |
| size=10 pos=end | 71.0 ns | 71.5 ns | +0.7% |
| size=100 pos=start | 35.6 ns | 34.1 ns | -4.2% |
| size=100 pos=middle | 282 ns | 280 ns | -0.7% |
| size=100 pos=end | 525 ns | 525 ns | 0% |
| size=1000 pos=start | 198 ns (noisy) | 259 ns (noisy) | N/A |
| size=1000 pos=middle | 2.60 µs | 2.57 µs | -1.2% |
| size=1000 pos=end | 5.03 µs | 4.95 µs | -1.6% |
| size=10000 pos=start | 4.83 µs (noisy) | 5.72 µs (noisy) | N/A |
| size=10000 pos=middle | 27.6 µs | 27.3 µs | -1.1% |
| size=10000 pos=end | 53.0 µs | 52.4 µs | -1.1% |

... so in short the latency difference is well within noise and probably shouldn't be too much of a concern.

@KowalskiThomas KowalskiThomas requested a review from picnixz April 21, 2026 21:57
@picnixz
Member

picnixz commented Apr 22, 2026

(1) one thread is calling list.remove (2) another one is mutating the list (3) the items in the list have a custom equality operator that always (or sometimes) releases the GIL or interrupts the CS more generally.

In this case, it is already a GIGO. You should use a lock otherwise.
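For readers following along, the lock-based workaround picnixz suggests would look something like this minimal sketch (the names are illustrative, not from the PR):

```python
import threading

shared = [1, 2, 3]
shared_lock = threading.Lock()


def safe_remove(value):
    # Hold the lock across the whole scan-and-remove so no other
    # thread can shift elements while a comparison is in flight.
    with shared_lock:
        shared.remove(value)


def safe_insert(index, value):
    # Every mutation of the shared list must take the same lock.
    with shared_lock:
        shared.insert(index, value)


safe_remove(2)
print(shared)  # [1, 3]
```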

